Sparse Linear Discriminant Analysis with Applications to High Dimensional Low Sample Size Data
Authors
Abstract
This paper develops a method for automatically incorporating variable selection in Fisher’s linear discriminant analysis (LDA). Utilizing the connection between Fisher’s LDA and a generalized eigenvalue problem, our approach applies the method of regularization to obtain sparse linear discriminant vectors, where “sparse” means that the discriminant vectors have only a small number of nonzero components. Our sparse LDA procedure is especially effective in the so-called high dimensional, low sample size (HDLSS) settings, where LDA possesses the “data piling” property: it maps all points from the same class in the training data to a common point, so that, when viewed along the LDA projection directions, the data are piled up. Data piling indicates overfitting and usually results in poor out-of-sample classification. By incorporating variable selection, the sparse LDA overcomes the data piling problem. The underlying assumption is that, among the large number of variables, many are irrelevant or redundant for the purpose of classification. By using only the important or significant variables, we essentially deal with a lower dimensional problem. Both synthetic and real data sets are used to illustrate the proposed method.
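The data piling phenomenon described in the abstract is easy to reproduce numerically. The sketch below is illustrative only: the random data, the dimensions, and the null-space construction are our assumptions, not the paper's sparse LDA method. When the number of variables p exceeds the sample size, there exist directions orthogonal to all within-class variation; projecting onto such a direction collapses every training point of a class to a single value, which is exactly the in-sample piling (and overfitting) the paper's variable selection is designed to avoid.

```python
# Illustrative sketch of "data piling" in an HDLSS setting (not the paper's
# method): with p >> n we can find a direction orthogonal to all within-class
# deviations, so each training class projects to a single point.
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 200                                # 10 samples per class, 200 variables
X0 = rng.normal(0.0, 1.0, (n, p))             # class 0
X1 = rng.normal(0.5, 1.0, (n, p))             # class 1, shifted mean
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

# Within-class deviations span at most 2n - 2 dimensions, so for p >> n
# their orthogonal complement (the null space of W) is large.
W = np.vstack([X0 - m0, X1 - m1])             # (2n, p) matrix of deviations

# Project the mean difference onto the null space of W: the residual of a
# least-squares fit of d onto the row space of W.
d = m1 - m0
w = d - W.T @ np.linalg.lstsq(W.T, d, rcond=None)[0]
w /= np.linalg.norm(w)

# Complete data piling: within each class, all projections coincide
# (up to numerical noise), yet the two piles are separated.
print(np.ptp(X0 @ w), np.ptp(X1 @ w))         # both essentially zero
print((X1 @ w).mean() - (X0 @ w).mean())      # positive gap between the piles
```

The perfect in-sample separation here is deceptive: the direction `w` is fit to noise in all 200 coordinates, which is why such projections generalize poorly and why restricting attention to a sparse set of truly informative variables helps.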
Similar references
Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy
Dimensionality reduction methods transform or select a low dimensional feature space to efficiently represent the original high dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields, especially in the analysis of high dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...
Sparse Linear Discriminant Analysis by Thresholding for High Dimensional Data
In many social, economical, biological, and medical studies, one objective is to classify a subject into one of several classes based on a set of variables observed from the subject. Because the probability distribution of the variables is usually unknown, the rule of classification is constructed using a training sample. The well-known linear discriminant analysis (LDA) works well for the situ...
A direct approach to sparse discriminant analysis in ultra-high dimensions
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among fea...
Sparse semiparametric discriminant analysis
In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Witten and Tibshirani, 2011, Cai and Liu, 2011, Mai et al., 2012 and Fan et al., 2012). In this paper, we develop high-dimensional sparse semiparametric discriminant analysis (SSDA) that generalizes the normal-theory discr...